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Abstract 

Various regulatory elements in messenger RNAs (mRNAs) carrying the secondary structure play important roles in a wide range of 
expression processes. Numerous recent works have focused on the discovery of these functional elements that contain the conserved 
mRNA structures. However, to date, regions with high structural stability have been largely overlooked. In this study, we defined high 
stability regions (HSRs) in the coding sequences (CDSs) in bacteria based on the normalized folding free energy. We found that CDSs 
had high number of HSRs, and these HSRs showed high structural context robustness compared with random sequences, indicating a 
direct selective constraint imposed on HSRs. A reduced ribosome speed was detected near the start position of HSR, implying a 
possibility that HSR acted as obstacle to drive translational pausing that coordinated protein synthesis. Interestingly, we found that 
genes with high HSR density were enriched in the processes of translation, protein folding, and cell division. In addition, essential 
genes exhibited higher HSR density than nonessential genes. Overall, our study presented the previously unappreciated correlation 
between the number variation of HSRs and cellular processes. 
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Introduction 

Messenger RNAs (mRNAs) with reverse complements have 
the potential to fold into local secondary structures that 
often act as regulatory elements in a series of cellular pro- 
cesses (Wan et al. 201 1 ; Dethoff et al. 201 2), including trans- 
lation initiation (Kudla et al. 2009; Gu et al. 201 0; Scharff et al. 
201 1 ; Keller et al. 2012) and localization of mRNA or protein 
(Power et al. 2004; Clarke and Clark 2010). The mRNA sec- 
ondary structure participates in cellular processes in several 
ways. One possibility is that the secondary structure interacts 
with RNA-binding factors, with the structural specificity to 
regulate the downstream processes (Dethoff et al. 2012). In 
this scenario, the specific structure is assumed to be conserved 
during evolution because the sequences serving the same 
function are generally accepted to share the same structure. 
In accordance with this hypothesis, numerous studies have 
performed the methods combined with phylogenetic analyses 
to identify the functional elements that contain the conserved 
mRNA structures. For example, a conserved stem-loop struc- 
ture within the ^-untranslated region of RNase E mRNA has 



been reported to regulate the degradation of RNAase E gene 
transcript (Diwa et al. 2000). By targeting the transcript that 
contains this element, bacterial cells can maintain RNase E 
near its optimal concentration (Diwa et al. 2000). Other ex- 
amples have also been found in the studies of long-range 
interactions (Li et al. 2010), internal entry site structures 
(Lukavsky 2009), and steroid receptor RNA activators 
(Novikova et al. 2012). 

Another way of regulation involving the local structural 
stability of mRNA is elucidated in a few initial reports (Giedroc 
and Cornish 2009; Watts et al. 2009; Gu et al. 2010; Tuller 
et al. 2010; Yu et al. 2011; Tholstrup et al. 2012). mRNA 
regions with high structural stability provide barriers for ribo- 
some movement, causing ribosome pausing (Wen et al. 
2008), which affects a large number of cotranslational pro- 
cesses (Thanaraj and Argos 1996; Cabrita et al. 2010; Siller 
et al. 2010). A stable structure downstream a slippery se- 
quence, for instance, stimulates ribosomal frameshift (Giedroc 
and Cornish 2009). A recent study has shown that both RNA 
pseudoknot and stem-loop structure lead to the frameshift 
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effectively (Yu et al. 201 1), indicating that high structural sta- 
bility instead of the specific structure is required for the frame- 
shift. Recently, Watts et al. (2009) have found that mRNA 
regions encoding interdomain loops of HIV proteins exhibit 
higher structural stability than other regions. Moreover, 
correlation between mRNA structural stability and protein sec- 
ondary structure has been noted (Jia et al. 2004; Luo et al. 
2004). These results suggest that the variation in structural 
stability is connected with protein folding. With this back- 
ground, we inferred that there are some functional elements 
on mRNAs, which maintain high structural stability regardless 
of the structural specificity. Unfortunately, most initial reports 
have focused on functional regions containing conserved sec- 
ondary structures (Fogel et al. 2002; Pedersen et al. 2004; 
Washietl and Hofacker 2004; Meyer and Miklos 2005; Peder- 
sen et al. 2006; Petrillo et al. 2006; Moss et al. 201 1 ; Goodarzi 
et al. 2012); high stability regions (HSRs) and their functions 
have been studied only in a few processes (e.g., protein fold- 
ing). Therefore, in this study, we examined the HSRs of mRNA 
from the genome-wide perspective and investigated the cor- 
relation between the number of HSRs (HSR density) and bio- 
logical processes. 

Materials and Methods 

Data Collection 

Coding Sequences and Orthologs 

Four enterobacteria {Escherichia coli K1 2 MG1 655, Salmonella 
enterica subsp. enterica serovar Typhi CT18, Shigella flexneri 
301, and Yersinia pestis C092) and two nonenteric Gamma- 
proteobacteria {Vibrio cholerae 01 biovar El Tor N16961 and 
Aeromonas hydrophila ATCC 7966) were used in this study. 
Protein-coding sequences (CDS) were downloaded from the 
National Center for Biotechnology Information FTP server 
(ftp://ftp.ncbi.nih.gov/genomes/). Sequences with length 
<200 nucleotides (nt) were excluded. The orthologous rela- 
tionships were obtained from the KEGG database (Kanehisa 
et al. 2008). Only one-to-one orthologs were used. The num- 
bers of CDSs and orthologs are summarized in supplementary 
table S1, Supplementary Material online. 

Ribosome Density 

Ribosome occupancy data sets in E. coli were obtained from 
the work of Li et al. (2012). We averaged the normalized 
ribosome occupancy (normalized by the mean occupancy of 
the corresponding transcript) at the same site of all available 
transcripts to obtain the mean ribosome density. 

Protein Abundance and mRNA Half-life 

Three data sets of protein abundance of E. coli were obtained 
from Lu et al. (2006), Lewis et al. (2010), and Taniguchi et al. 
(2010). For each data set, the values of protein abundance 



were normalized by the mean of data set. We averaged the 
three normalized data sets to obtain an integrated data set. 
mRNA half-life data were derived from the work of Selinger 
et al. (2003). 

Gene Essentiality and Protein-Protein Interaction 

Essential and nonessential genes of E. coli were defined as 
described in Kato and Hashimoto (2007) (302 essential 
genes and 4,139 nonessential genes). The protein-protein in- 
teraction (PPI) data of E. coli were taken from Arifuzzaman 
et al. (2006) (2,667 proteins and 16,050 interacting patterns). 

Local Structure Prediction and Structural Stability 
Calculation 

We used RNAfold in the Vienna RNA Secondary Structure 
Package (Gruber et al. 2008) to predict the local structure 
and calculate the minimum folding free energy (MFE) along 
the CDS using a sliding window with 50 nt in length and a 
step of 1 0 nt. A small window size (50 nt) was used because it 
approximated the length of regions (40 nt) covered by 
ribosome during elongation. To rule out the effect of base 
composition, which strongly affects MFE (Dawson and 
Yamaoto 1999; Mathews et al. 1999), MFE was normalized 
by the base composition of the corresponding sequence. For 
each sequence, we shuffled all nucleotides, while controlling 
for base composition. We repeated this process 100 times to 
obtain 100 random sequences. The MFE of random se- 
quences was calculated using RNAfold. The normalized 
MFE, z-score, was calculated by equation (1). 

Z SCOre m ^ native ~ ^^random 

a 

where mfe native is the free energy of native sequence, and 
nnfe ra ndom and a are the mean and standard deviation of 
the MFE of 100 random sequences, respectively. 

Structure Density 

We used a threshold of -0.65 to define the HSR. A z-score of 
-0.65 means that approximately 60% nucleotides in a 
window are base paired, which was approximately equal to 
the mean percentage of paired sites (62.26%; supplementary 
fig. S1, Supplementary Material online) in E. coli tRNA. To 
reduce false positives, we only considered regions with more 
than two continuous sliding windows in which the z-scores 
were all below the threshold. If the percentage of overlapping 
sites of two adjacent HSRs was higher than 50%, the two 
HSRs were combined. To exclude the effect of sequence 
length on the number of HSRs, we defined the HSR density 
of transcript as the number of HSRs per kilobases (kb). 

Conserved HSRs 

We defined conserved HSRs between E. coli and 5. enterica. 
Considering that insertions and deletions in sequences 
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strongly affect the position of HSR, we first aligned the ortho- 
logs using MUSCLE (Edgar 2004), and then excluded the 
alignments with insertions and deletions >10nt. For each 
HSR in E. coli (HSR-eco), we searched for the homologous 
HSR in the orthologous region in S. enterica (HSR-sty). If 
HSR-sty was found and the percentage of overlapping posi- 
tions between HSR -eco and HSR-sty was higher than 50%, 
the two HSRs were defined as conserved HSRs. 



create a legal structure, struc{HSR'). The relative change of 
MFE was calculated by equation (2). 

r. , ■ r r mfe r - mfe 

Relative change of mfe = — ; — —. — , (2) 

| mfe | 

where mfe' is the MFE of struc(HSR f ) and mfe is the MFE of 
HSR. For each HSR, we generated 50 sets of rFAR-HSR-rBAR, 
and then averaged 50 relative changes of MFE to obtain the 
SCR of HSR. 



Random Sequence and Control 

Mononucleotide shuffling cannot preserve codon usage and 
amino acid sequence of native sequence. Thus, the HSRs we 
defined might result from codon usage or amino acid bias. To 
resolve this issue, we generated 20 random sequences for 
each CDS by shuffling synonymous codons among sites 
with identical amino acids, while maintaining the codon 
usage, amino acid sequence, and GC content. The HSR den- 
sity in the random sequence (rHSR density) was defined as 
described earlier. In such permutation, the signals of codon 
usage and amino acid sequence were preserved. Conse- 
quently, the HSRs caused by these signals were preserved, 
whereas the other HSRs were perturbed. We compared 
rHSR densities among different gene categories to test 
whether our results were affected by codon usage or amino 
acid bias. Additionally, 20 nonfunctional sequences were gen- 
erated for each CDS by shuffling mononucleotides, which 
were used as control. 



Structural Context Robustness of HSR 

The structural context robustness (SCR) indicates an intrinsic 
tendency of subsequence to be structurally indifferent to 
its surrounding sequences and to be resistant to the interrup- 
tions of context (Lee and Kim 2008). For most structural 
elements, the structural specificity or structural stability directly 
affects their functions. Thus, such elements might have high 
SCR, as described by Lee and Kim (2008) and Sewer et al. 
(2005). 

To determine whether natural selection operates on HSRs 
directly, we estimated the SCR of HSR based on the method 
proposed by Lee and Kim (2008). We defined the SCR of HSR 
as the relative change in MFE when HSR was embedded in 
random surrounding sequences. For each HSR, we extracted 
the forward and backward adjacent regions (FAR and BAR, 
whose lengths are equal to the corresponding HSR). The 
random adjacent regions, rFAR and rBAR, were then gener- 
ated by shuffling synonymous codons among sites with iden- 
tical amino acids, while maintaining genomic frequency of 
codons and amino acid sequence. A concatenated sequence, 
rFAR-HSR-rBAR, was created and whose secondary structure 
was predicted by RNAfold. The portion of such structure cor- 
responding to the HSR index was extracted and modified to 



Gene Classification 

The classifications of gene products in E. coli were down- 
loaded from the GenProtEC database (Serres et al. 2004). 
We compared HSR densities among the first five classifica- 
tions: metabolism, information transfer, regulation and trans- 
port, and cell processes. For details, we used Gene Ontology 
(GO) (Ashburner et al. 2000) to show the correlation between 
biological process and HSR density. GO enrichment analysis 
was performed using DAVID Bioinformatics Resources (Da 
Wei Huang and Lempicki 2008). 

Results 

Nonconserved Local Structure with High Stability 

Recently, a stable structure located in the approximately 
30-80 nt interval downstream of the start codon (supplemen- 
tary fig. S2, Supplementary Material online) has been reported 
and assumed to be correlated with translational control (Tuller 
et al. 201 1). Here, we analyzed the conservation of this func- 
tional structure between E. coli and 5. enterica. We found that 
very similar sequences showed vastly different local structures 
in the 30-80 nt interval (fig. ^A) I suggesting that local struc- 
ture in this region was nonconserved. To verify this finding, we 
calculated the correlation between structure distance (calcu- 
lated by RNAdistance; Gruber et al. 2008) and sequence iden- 
tity. As shown in figure ^B I a negative correlation was found 
(R= -0.429, P< 2.2 x 10~ 16 , fig. IB). The structure distance 
linearly decreased as sequence identity increased from 80% 
to 100%. Additionally, we checked this relationship using ran- 
dom sequences generated by shuffling synonymous codons, 
while maintaining codon usage, amino acid sequence, and 
sequence distance of orthologs. A similar pattern was found 
(supplementary fig. S3, Supplementary Material online). The 
slope in the random sequence was similar to that in the native 
sequence (-3.49 for random sequence and -3.56 for native 
sequence). Overall, these results indicated that no appreciable 
evolutionary constraints existed for maintaining the conserva- 
tion of the local structure in the 30-80 nt interval. The results 
supported the claim that structural stability (measured by MFE) 
in this region is under selection and correlated with the speed 
of translation elongation (Tuller et al. 2011). These findings 
also led us to survey other potential functional elements with 
high structural stability on mRNAs. 
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Fig. 1. — Conservation analysis of a representative local structure. (A) z-scores along two very similar sequences are shown, z-scores near the 30-1 00 nt 
interval are significantly lower than 0, suggesting an evolutionary pressure to maintain the high structural stability. However, the structures of two 
sequences in this region vary vastly. (B) Correlation between structure distance and sequence identity. A significant linear correlation is shown when 
sequence identity >0.85. 



HSRs Are Common in CDSs 

We surveyed HSRs in six bacteria (Materials and Methods). 
Figures 2 and 3 showed the representative patterns inferred 
from E. coli. Similar patterns in other species were shown in 
supplementary figures S4 and S5, Supplementary Material 
online. As seen in figure 2, 92.8% sequences contained at 
least one HSR and 78.3% sequences contained three or more 
HSRs, whereas the corresponding percentages in the control 
were 55.4% and 9.7%, respectively (fig. 2). The significantly 
higher HSR density in the native sequence than that in the 
control (Wilcoxon test, all P values in six species were 
<2.2 x 10~ 15 ) suggested a selective constraint on CDSs to 
maintain the high number of HSRs. Additionally, we predicted 
the conserved secondary structures using RNAz (Gruber et al. 
2007) with the same widow size and step used for HSR pre- 
diction. The result showed that the percentage of HSRs con- 
taining the conserved structures was low (12.9% and 31 .7% 
on average, when the fractions of overlapping sites were 1 .0 
and 0.5, respectively, fig. 4), suggesting that a considerable 
fraction of functional elements would be ignored when we 
only considered ones with conserved secondary structure. 

The locations of HSRs along CDSs were investigated as 
well. The result showed that a higher number of HSRs were 
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Fig. 2. — The distribution of HSR density in Escherichia coli. Native: HSR 
densities in native sequences. Control: HSR densities in nonfunctional se- 
quences, which were generated by shuffling mononucleotides of corre- 
sponding native sequence. rHSR: HSR densities in random sequences, 
generated by shuffling synonymous codons among sites with identical 
amino acids, while maintaining codon usage, GC content, and amino 
acid sequence. 



located in the 5 r or 3 r coding region compared with other 
regions (fig. 3). More than 12% sequences exhibited HSRs 
in the first or last 90 nt, whereas the mean frequency de- 
creased to approximately 3.7% in other regions. 5'-HSRs 
might exert an effect on translation initiation and early stage 
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Fig. 3. — The locations of HSRs along transcripts in Escherichia coli. (A, B) Along the CDSs, the values of every five sliding windows were combined. For 
example, 6.4% at position 90 indicates that there are 6.4% sliding windows containing HSRs in the region from the start codon to the downstream 90 nt. 
(C, D) The relative frequencies of HSRs in the first five windows are shown. 
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Fig. 5. — The distribution of SCR values of HSRs in Escherichia coli. 
Native: native HSRs; Random: random HSRs (rHSRs) generated by shuffling 
native HSRs, while maintaining codon usage and amino acid sequence. In 
addition, the MFE of rHSR is similar to that of native HSR. 



Unsurprisingly, HSR density was strongly dependent on the 
threshold. However, similar results were obtained when 
examining different thresholds (supplementary figs. S4 and 
S5, Supplementary Material online). 



of translation elongation, and 3'-HSRs might be associated 
with translation termination or RNA decay. 

To ascertain the robustness of our results, we first investi- 
gate the effect of codon usage and amino acid bias using 
random sequences (Materials and Methods). Because of the 
extensive amount of computations needed for predicting so 
many secondary structures, we only calculated the rHSR den- 
sity in E. coli. As expected, we found that HSR density was 
significantly higher than rHSR density (Wilcoxon test, 
P< 2.2 x 1 0~ 15 , fig. 2), suggesting a selection for HSR density 
even when controlling for codon usage and amino acid bias. 
We then tested the effects of thresholds. Other thresholds 
(e.g., -0.45, which was the lowest mean value of the 
z-score of transcripts) were applied to define the HSRs. 



Structural Robustness of HSR 

In previous section, we defined HSRs using a low z-score 
value, which, however, did not guarantee that there is a 
direct selective constraint imposed on HSR to maintain high 
structural stability. To resolve this issue, we compared the SCR 
of native HSR with that of rHSR that had the same codon 
usage, amino acid sequence, and similar MFE to HSR (located 
in the range MFE HS r± 10% MFE HS r)- We proposed the null 
hypothesis: if the SCR of HSRs is equal to that of rHSR, HSRs 
are a by-product of selection for other factors that affect struc- 
tural stability. We found that the SCR values of HSRs approx- 
imated a normal distribution with a mean of 0.34 (fig. 5 and 
supplementary fig. S6, Supplementary Material online). 
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Fig. 6. — Mean ribosome density around the start of HSRs. 0 in hor- 
izontal axis: the start position of HSRs. Top: genes with expression levels at 
the top 30%. All: all available CDSs. Two center positions of ribosome 
accumulation are indicated by dashes. 



Table 1 

HSR Densities in Different Gene Categories 



Category 



Genes 



Native 3 



Control 1 



Control 2 



Cell processes 


432 


3.802 ± 


0.088 


0.951 ± 


0.052 


2.742 ± 


0.031 


Transport 


667 


3.775 ± 


0.070 


0.999 ± 


0.041 


2.838 ± 


0.051 


Information 


773 


3.746 ± 


0.067 


0.972 ± 


0.040 


2.715± 


0.047 


transfer 
















Metabolism 


1,596 


3.629 ± 


0.046 


1.018± 


0.027 


2.765 ± 


0.043 


Regulation 


460 


3.561 ± 


0.085 


1.012± 


0.052 


2.644 ± 


0.048 



a Native: mean HSR density in native sequences. Control 1: mean HSR density 
in random sequences generated by shuffle mononucleotides. Control 2: mean HSR 
density in random sequences generated by shuffle all synonymous codons among 
sites with identical amino acids, whereas maintaining codon usage, GC content, 
and amino acid sequence. 



By contrast, the mean SCR value of rHSR was approximately 
0.42. The result that HSRs exhibited a significantly 
higher degree of SCR than the random (paired f-test, 
P<2.2 x 1CT 16 ) rejected the null hypothesis and suggested 
a direct selective constraint on HSRs for high structural 
stability. 

Functions of HSRs 
HSRs Block Translation 

Variation in the local translation rate partly regulated by the 
mRNA secondary structure affects the protein folding (Komar 
2009; Zhang et al. 2009) and the localization of protein 
(Mariappan et al. 2010) or mRNA (Yanagitani et al. 201 1). It 
is of interest to study the correlation between the local trans- 
lation rate and mRNA structure. The recent findings of Li et al. 
(2012) reporting genome-wide measurements of ribosome 
occupancy at a resolution of single nucleotides for E. coli 
may enable us to compare the mean ribosome speed in 
HSRs with that in other regions. Surprisingly, we found that 
the ribosome density in HSRs was significantly lower than that 
in FARs (1 .007 in FARs, 0.956 in HSRs, paired f-test, P< 10~ 5 , 
fig. 6). By checking the ribosome density in HSRs, we found a 
notable reduction near the 40 nt downstream of the start 
position of HSRs (fig. 6). No obvious ribosome accumulation 
was observed in either FARs or HSRs. The results seemed to 
contradict the fact that the secondary structure slows down 
translation elongation (Wen et al. 2008; Tholstrup et al. 201 2), 
and that higher ribosome density would be observed in the 
HSRs. A possible explanation for the results is that the ribo- 
some density on mRNA is typically too low, and ribosomes are 
blocked at the start position of HSRs. To validate this explana- 
tion, we repeated the earlier mentioned processes using genes 
with expression levels at the top 30%. As expected, we found 
two regions with obvious ribosome accumulation near the 
start of HSRs (all P values < 10~ 5 , fig. 6). Taken together, 
we concluded that HSRs blocked translation elongation and 
that local translation efficiency was regulated by HSRs, at least 
partly. 



Significant Difference in HSR Density among Different 
Gene Categories 

mRNA secondary structures influence a number of cotransla- 
tional processes by modulating the local translation rate. The 
proper functions of gene are partly dependent on these reg- 
ulation processes. Therefore, it is possible that HSRs have 
effect on gene functions. To test this hypothesis, genes 
were classified into five categories based on the GenProtEC 
database (Serres et al. 2004), and Kruskal-Wallis analysis of 
variance was performed among these categories. A weak but 
significant difference (P= 0.029, table 1 and supplementary 
table S2, Supplementary Material online) in HSR density 
among the five categories was found. Genes involved in cell 
processes and transport exhibited a higher HSR density, 
whereas regulation genes showed a lower value. By contrast, 
no remarkable difference was observed in the control (P > 0. 1 , 
table 1). Additionally, we also compared rHSR densities 
among gene categories and found a different pattern. 
Metabolism and transport genes showed higher rHSR density 
than other genes (table 1). Although regulation genes also 
had the lowest rHSR density, the mean ratio (rHSR density in 
cell processes/rHSR density in regulation) is significantly lower 
than that based on HSR density (1 .037 on the average vs. 
1.071, Wilcoxon test, P= 0.00085). Overall, the results indi- 
cated that genes involved in different processes had different 
HSR densities. 

For details, GO analyses were performed using the web 
application DAVID (Da Wei Huang and Lempicki 2008). 
Genes with HSR density at the top and bottom 30% were 
extracted, and the most enriched processes of the two groups 
were compared. We found that genes in the top group were 
enriched in the processes of cell division, cell morphogenesis, 
protein folding, and translation (fig. 7A), whereas genes in the 
bottom group were involved in fermentation and several pro- 
cesses related to biosynthesis (fig. IB). Similar results were 
observed when conserved HSRs were used (supplementary 
fig. S7, Supplementary Material online). Moreover, the results 
remained unchanged when examining different thresholds 
(supplementary figs. S8 and S9, Supplementary Material 
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Fig. 7. — The most enriched terms between the top (A) and bottom (B) groups of HSR density. Some identical terms in two groups are not shown. 



online). Overall, these results suggested that the HSR density 
was associated with gene functions, and that genes involved 
in different biological processes had different number of HSRs. 

Significant Association between HSR Density and Gene 
Essentiality 

Essential genes are those indispensable for the survival of an 
organism and are thus considered as foundations of life. 
Previous studies have proposed a series of features to distin- 
guish essential and nonessential genes, such as protein con- 
nectivity (Arifuzzaman et al. 2006) and gene expression 
(Jeonga et al. 2003). Here, we found that essential genes 
have significantly higher HSR density than nonessential 
genes (4.05 in essential vs. 3.60 in nonessential, Wilcoxon 
test, P= 0.0001 7), whereas the difference was not obvious 
in the control (0.947 in essential vs. 0.959 in nonessential, 
Wilcoxon test, P=0.86). Note that a difference in protein 
connectivity between essential and nonessential genes has 
been reported (Arifuzzaman et al. 2006), we investigated 
the correlation between HSR density and connectivity. 
Although no significant correlation was observed, we found 
a weak but significant difference in HSR density between the 
genes with connectivity at the top 30% and bottom 30% 
(3.49 in bottom vs. 3.75 in top, Wilcoxon test, P= 0.0067), 
whereas no significant difference was found in the control 
(0.94 at the bottom vs. 0.95 at the top, Wilcoxon test, 
P=0.81). Again, to exclude the effect of codon usage and 
amino acid bias, we calculated the ratio D density (D density = HSR 



density in essential genes/HSR density in nonessential genes) 
and compared D density with that in random sequence 
(^density)- We found that D den tisy is significantly higher than 
^density (1-12 in D density vs. 1.02 on the average of rD density , 
Wilcoxon test, P=9.5 x 10~ 7 ). Overall, all these findings re- 
vealed that there was an association between HSR density and 
gene essentiality, and that genes with high HSR density 
tended to have high connections with other genes. 

Neither Protein Abundance nor mRNA Decay Attributes 
to the Variation in HSR Density 

Gene functions are generally accepted to be associated with 
mRNA decay (Bernstein et al. 2002) and expression level (Barry 
et al. 2005; Lukk et al. 2010), which are partly regulated by 
the structural stability of mRNA (Kudla et al. 2009; Novikova 
et al. 2012; Tani et al. 2012; Zur and Tuller 2012). Thus, the 
difference in HSR density among gene categories might be an 
artifact arising from the difference in mRNA decay or expres- 
sion level. To rule out this possibility, we calculated the corre- 
lations between HSR density, protein abundance, and mRNA 
half-life. The results showed that there was no significant cor- 
relation between HSR density and protein abundance 
(Spearman correlation, /? = 0.016, N= 1,543, P=0.52), as 
well as between HSR density and mRNA half-life (Spearman 
correlation, /? = 0.03, N = 2,552, P=0.1 1). Moreover, we ex- 
tracted genes with expression levels at top 30% and bottom 
30%. HSR densities of the two groups were compared, and 
again no significant difference was found (3.74 at top, 3.67 at 
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bottom, A/ = 462, P=0.56). These results indicated that the 
difference in HSR density among gene categories was not 
connected with the variation in protein abundance and 
mRNA decay. 

Discussion 

By surveying secondary structures in various genomes, previ- 
ous studies have revealed that a large number of genomes are 
being transcribed to produce mRNAs that generally contain 
local secondary structures (Katz and Burge 2003; Meyer and 
Miklos 2005; Kertesz et al. 201 0). Most studies focused on the 
structural conservation of functional elements. In this study, 
however, we showed that not all functional elements within 
mRNAs were structurally conserved. The elements might 
maintain high structural stability rather than structural specifi- 
city during evolution. This finding led us to survey other re- 
gions with high structural stability. As expected, we found that 
CDSs exhibited substantially high HSR levels compared with 
the control. In particular, we found that HSRs tended to be 
located in the 5' or 3' coding region, which was consistent 
with the findings of Tuller et al. (201 1). In addition, previous 
studies have shown that there is a universally reduced struc- 
tural stability near the start codon (Gu et al. 2010). However, 
our results suggested that a few genes (~6%, fig. 4) still 
maintained higher structural stability in this region during evo- 
lution, and that these genes might be under selection for low 
initiation efficiency. 

Although we defined HSRs using low z-score value, it did 
not ensure that there is a direct selective constraint imposed 
on HSRs to retain high structural stability. Thus, we calculated 
the SCR of HSRs. Our results indicated that HSRs were not the 
by-product of selection for other factors and HSRs might be 
functional. This result led us to investigate the correlation be- 
tween the number of HSRs and gene functions. GO analysis 
showed that genes with similar HSR density tended to be en- 
riched in the same biological processes. We also found that 
essential genes showed higher HSR density than nonessential 
genes. Furthermore, we have ascertained that these results 
remained unchanged under various controls, in various organ- 
isms (supplementary figs. S4-S9, Supplementary Material 
online). Additionally, the results still held even after excluding 
the HSRs that contained the conserved secondary structures 
(supplementary fig. S10, Supplementary Material online). 

We further showed that the number variation of HSRs was 
not connected with protein abundance and mRNA decay. Our 
result seemly contradicted the findings in previous study that 
reported a positive correlation between the structural stability 
of mRNA and protein abundance (Zur and Tuller 2012). To 
explain this difference, we calculated the correlation between 
the mean structural stability of CDSs and protein abundance. 
Indeed, we found a weak but significant correlation 
(/?=-0.08, N= 1,543, P<0.001), which was consistent 
with the claim that genes with high structural stability had 



high protein abundance (Zur and Tuller 2012). However, 
HSRs were small regions on mRNA (that is, the regions cov- 
ered by all HSRs of a CDS were ~30%, on average), and more 
than 94% HSRs were located in the regions downstream of 
the 90 nt (fig. 4). Thus, the number variation of HSRs might 
affect local elongation efficiency but had weak effect on 
whole translation efficiency. Moreover, we focused on pro- 
karyotes and investigated the correlation between the 
number of HSRs and protein abundance, which is different 
from the Zur's work. 

How does the number of HSRs contribute to gene func- 
tions or gene essentiality? One possibility is that it affects 
posttranslational modification of the nascent polypeptide by 
changing the translation speed as summarized by Shabalina 
et al. (2013). An example of posttranslational modification 
involving mRNA secondary structure has been shown for 
actins (Zhang et al. 2010). mRNA encoding gamma-actin 
forms a stable structure near the translation initiation site, re- 
sulting in a significant reduction in the translation speed. 
Although this reduction does not significantly affect the over- 
all protein abundance, it leads to a slower folding of 
gamma-actin due to ribosome pausing and thus makes it vul- 
nerable to ubiquitin conjugation machinery attracted by 
cotranslational arginylation. Another possibility is that HSRs 
are connected with the regulation of protein folding. It has 
been revealed that HSRs lead to ribosome pausing, which may 
have drastic effects on the folding efficiency of newly synthe- 
sized proteins (Jia et al. 2004; Watts et al. 2009). Overall, HSRs 
might influence gene functions by changing the local transla- 
tion rate. 

This analysis has several limitations. First, we used a sliding 
window to define the HSRs, and the boundary of HSR was 
difficult to determine. Second, we focused on the correlation 
between the number of HSRs and biological processes and did 
not determine whether all HSRs are functional. Third, al- 
though we showed that HSRs blocked the ribosome speed, 
we failed to show the evidence that HSRs influence gene 
functions by changing the ribosome speed. The mechanisms 
underlying the association between HSRs and gene function 
are worth pursuing at a deeper level. 

Supplementary Material 

Supplementary figures S1-S10 and tables S1 and S2 are avail- 
able at Genome Biology and Evolution online (http:/A/wwv. 
gbe.oxfordjournals.org/). 
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